Abstract
Background: Graft-versus-host disease (GVHD) remains the leading cause of non-relapse morbidity and mortality in recipients of allogeneic hematopoietic stem cell transplantation (allo-HSCT). Clinical scoring systems (e.g., EBMT, HCT-CI, DRI) can stratify risk but show modest predictive performance for GVHD (C-statistics 0.49–0.67). To address this gap, we developed GVHD-Intel 1.0, a real-time, scalable, and interpretable machine learning (ML) framework, using data from an international study. It predicts both acute GVHD (aGVHD) and chronic GVHD (cGVHD) and provides individual and bulk predictions from common input variables and HLA alleles.
Methods: This retrospective multinational study, approved by the respective institutional review boards, included electronic health records from 2009 to 2023 from three tertiary care centers in Abu Dhabi, UAE, and from King Hussein Cancer Center (KHCC), Jordan.
The model was developed in phases. The UAE399 cohort (399 HSCT recipients) served as the baseline dataset. To improve generalizability, 55 adult allo-HSCT recipients from KHCC were added to the training set. Two distinct test cohorts were used: UAE46 (46 allogeneic HSCT recipients from the UAE) and KHCC150 (150 adult allogeneic HSCT recipients from KHCC), both collected later than, and independently of, the training data.
Variables included donor type, sex mismatch, disease status, time from diagnosis to transplant, and agents used in conditioning and GVHD prophylaxis (e.g., TBI, ATG, and alemtuzumab). HLA alleles (A, B, C, DQ, DR) were symbolically encoded and then one-hot encoded. GVHD events were classified as aGVHD (<100 days) or cGVHD (≥100 days). Missing values were excluded, and class imbalance was handled via class weighting.
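The encoding and weighting steps described above can be illustrated with a minimal sketch. It is not the GVHD-Intel 1.0 implementation: the function names, the `locus*allele` column labels, and the allele values are hypothetical, and the "balanced" weighting formula (n_samples / (n_classes × class_count)) is assumed here because the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def one_hot_hla(records, loci=("A", "B", "C", "DQ", "DR")):
    """Turn per-patient HLA typings into 0/1 indicator columns.

    records: list of dicts mapping a locus name to a list of allele labels.
    Returns (column_names, matrix), one column per locus/allele pair observed.
    """
    columns = sorted({f"{locus}*{allele}"
                      for rec in records
                      for locus in loci
                      for allele in rec.get(locus, ())})
    index = {name: i for i, name in enumerate(columns)}
    matrix = []
    for rec in records:
        row = [0] * len(columns)
        for locus in loci:
            for allele in rec.get(locus, ()):
                row[index[f"{locus}*{allele}"]] = 1  # allele present at this locus
        matrix.append(row)
    return columns, matrix


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed scheme)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up low-resolution typings for three patients (illustration only).
patients = [
    {"A": ["02", "24"], "B": ["07", "35"]},
    {"A": ["02", "03"], "B": ["07", "08"]},
    {"A": ["24", "24"], "B": ["35", "51"]},
]
columns, X = one_hot_hla(patients)
weights = balanced_class_weights([1, 1, 0])  # 1 = GVHD event, 0 = no event

print(columns)
print(X)
print(weights)
```

A homozygous typing (the third patient's two copies of A*24) collapses to a single indicator in this sketch; a count-based encoding would be one alternative if allele dosage matters.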
GVHD-Intel 1.0, co-developed with MBZ University of Artificial Intelligence, features a modular ensemble architecture, drug-level feature engineering, a novel HLA encoding for low-resolution compatibility scoring, and built-in interpretability using SHAP and LIME. The framework is cloud-deployable for both real-time and batch predictions.
Results: GVHD-Intel 1.0 was externally tested on two independent, temporally and institutionally distinct cohorts (UAE46 and KHCC150) and showed strong, consistent predictive performance for both acute and chronic GVHD.
For cGVHD prediction, the model achieved an AUC (area under the receiver operating characteristic curve, equivalent to the C-statistic for binary outcomes) of 0.832, F1 score of 0.896, accuracy of 82.7%, PPV of 0.930, sensitivity of 0.864, Brier score of 0.178, and log loss of 0.543.
For aGVHD, performance metrics included an AUC of 0.802, F1 score of 0.860, accuracy of 77.6%, PPV of 0.930, sensitivity of 0.799, Brier score of 0.196, and log loss of 0.581.
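As a quick consistency check on the reported figures, the F1 score is the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below recomputes it from the PPV and sensitivity values quoted above; the function name is illustrative only.

```python
def f1_from_ppv_sensitivity(ppv, sensitivity):
    """F1 = 2 * PPV * sensitivity / (PPV + sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


cgvhd_f1 = f1_from_ppv_sensitivity(0.930, 0.864)
agvhd_f1 = f1_from_ppv_sensitivity(0.930, 0.799)

print(f"{cgvhd_f1:.3f} {agvhd_f1:.3f}")  # prints "0.896 0.860"
```

Both recomputed values match the reported F1 scores to three decimal places.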
Brier scores remained below 0.20 for both tasks, consistent with good calibration of the predicted probabilities.
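For readers less familiar with the probabilistic metrics reported here, the following sketch shows how the Brier score, log loss, and AUC are computed for a binary outcome. The labels and predicted probabilities are toy values chosen for illustration, not study data, and the function names are not taken from the GVHD-Intel 1.0 code.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count one half), which equals the C-statistic for binary outcomes."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = GVHD event, 0 = no event.
y_true = [1, 1, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.7, 0.2]

toy_brier = brier_score(y_true, y_prob)
toy_log_loss = log_loss(y_true, y_prob)
toy_auc = roc_auc(y_true, y_prob)

# Reference point: always predicting 0.5 gives Brier 0.25 and log loss ln(2).
flat = [0.5] * len(y_true)
flat_brier = brier_score(y_true, flat)
flat_log_loss = log_loss(y_true, flat)

print(toy_brier, toy_log_loss, toy_auc)
print(flat_brier, flat_log_loss)
```

An uninformative constant prediction of 0.5 yields a Brier score of 0.25 and a log loss of about 0.693, which gives one simple reference point for reading the values reported above. The Brier score reflects both calibration and discrimination, so it is not a pure calibration measure.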
Conclusions: GVHD-Intel 1.0 is a robust, interpretable machine learning framework developed from real-world, multicenter data. By integrating widely available clinical variables and HLA allele data, it achieved AUCs (0.80–0.83) above the C-statistics reported for conventional clinical risk scores (0.49–0.67) in predicting both acute and chronic GVHD in allo-HSCT recipients. The slightly lower performance for aGVHD (AUC 0.802 vs. 0.832 for cGVHD) is likely due to the absence of variables relevant to aGVHD from the input set. Next steps include prospective validation across multiple transplant centers globally and continued refinement through expanded data integration. GVHD-Intel 1.0 holds strong potential as a clinically valuable, scalable decision-support tool that practicing transplant physicians can readily adopt across a wide range of settings.